Domain-creating Constraints

Authors

  • Robert L. Goldstone
  • David Landy
Abstract

The contributions to this special issue on cognitive development collectively propose ways in which learning involves developing constraints that shape subsequent learning. A learning system must be constrained to learn efficiently, but some of these constraints are themselves learnable. To know how something will behave, a learner must know what kind of thing it is. While this has led previous researchers to argue for domain-specific constraints that are tied to different kinds/domains, an exciting possibility is that kinds/domains themselves can be learned. General cognitive constraints, when combined with rich inputs, can establish domains, rather than these domains necessarily preexisting prior to learning. Knowledge is structured and richly differentiated, but its "skeleton" need not always be pre-established. Instead, the skeleton may be adapted to fit patterns of co-occurrence, task requirements, and goals. Finally, we argue that for models of development to demonstrate genuine cognitive novelty, it will be helpful for them to move beyond highly pre-processed and symbolic encodings that limit flexibility. We consider two physical models that learn to make tone discriminations. They are mechanistic models that preserve rich spatial, perceptual, dynamic, and concrete information, allowing them to form surprising new classes of hypotheses and encodings.

Learning requires constraints. Gold (1967) and Chomsky (1965) formally showed that, if there are no constraints on what grammars can look like, there are too many possible language grammars for a language to be learned in a finite amount of time, let alone in two years. In a related analysis, Wolpert (1996) showed that there is no such thing as a truly general and efficient learning device. To be an efficient learner, one must make assumptions about the kind of structure one is expecting to find. Allegorically, if you are trying to find your favorite pair of socks and you only know that they are somewhere in your enormously large sock drawer, it will take you an enormously long time to find them. Some constraints allow you to limit your search to particular regions. Knowing that your socks are somewhere in the drawer means that you need never look anywhere outside it. Other, softer constraints simply determine an order in which the hypothesis space is searched. The physical structure of the drawer and the opacity of your socks incline you to consider the top of the drawer before the layers underneath. When the constraints coincide with reality – your favorite pair of socks really is at the top – they can turn unsolvable problems into relatively simple ones. Many problems in cognitive science, such as language learning and scene interpretation, apparently involve the cognitive equivalent of an infinitely large sock drawer, and hence require powerful constraints.

Psychologists have applied these formal results on the need for constraints to development and learning, concluding that different domains (including language, but also physics, biology, quantitative reasoning, social relations, and geometry) have their own special structures which must be exploited if learning is to be efficient (Spelke & Kinzler, 2007). Efficiently exploiting these kinds of structures entails having different kinds of constraints for different domains.
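The force of these formal results can be conveyed with a small simulation. The toy sketch below is our own illustrative construction, not a reconstruction of Gold's or Wolpert's proofs: it compares an unconstrained learner, for which every possible labeling of a small instance space is a hypothesis, with a learner constrained to single-feature rules. Only the constrained learner can generalize beyond its training examples.

    # Toy demonstration: without constraints on the hypothesis space, observed
    # examples place no constraint at all on predictions for unseen items.
    # This is our own illustration, not a reconstruction of the formal proofs.
    import itertools, random
    random.seed(0)

    N_BITS = 4
    instances = list(itertools.product([0, 1], repeat=N_BITS))
    target = lambda x: x[2]              # the true concept: feature 2 determines the label

    train = random.sample(instances, 8)  # the learner sees half of the instance space
    test = [x for x in instances if x not in train]

    # Unconstrained learner: every possible labeling of the 16 instances is a
    # hypothesis.  Keep those consistent with the training data and predict by vote.
    all_labelings = itertools.product([0, 1], repeat=len(instances))
    consistent = [h for h in all_labelings
                  if all(h[instances.index(x)] == target(x) for x in train)]

    def vote(x):
        votes = [h[instances.index(x)] for h in consistent]
        return sum(votes) / len(votes)   # 0.5 means the learner has no idea

    # Constrained learner: the only hypotheses are "the label equals feature i".
    surviving_features = [i for i in range(N_BITS)
                          if all(x[i] == target(x) for x in train)]

    print([vote(x) for x in test])       # every unseen item gets 0.5: no generalization
    print(surviving_features)            # essentially only the true feature survives

The unconstrained learner remains at chance on every item it has not seen, whereas the constrained learner, whose bias happens to fit the world, identifies the correct rule from the same data.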
The specific nature of many of these domains, and the corresponding nature of their internal constraints, was detailed in the contributed articles to the 1990 special issue of Cognitive Science devoted to structural constraints on cognitive development. The current special issue of Cognitive Science could be considered a 20th anniversary homage to this previous special issue. The contributions to the 2010 issue are no less concerned with constraints than those of the 20th century issue. A deeper inspection of the current issue's contents does, however, show an evolution in how cognitive developmentalists conceptualize constraints.

The articles in the 1990 special issue tended to posit internal constraints that paralleled structural characteristics of entities in particular evolutionarily important domains. A couple of examples provide helpful reminders as to their general modus operandi. Spelke (1990) argued that infants are constrained to assume that objects follow smooth trajectories through space and time. This is an eminently reasonable assumption because objects do not typically pop into and out of existence spontaneously, but rather move smoothly and vary conservatively. E. Markman (1990) argued that children need constraints on word meanings in order to learn them in a reasonable amount of time. For example, children assume that a word refers to a whole object rather than part of the object, ceteris paribus. They assume that words refer to taxonomic kinds rather than thematically related objects. In addition, they assume that a word will refer to an unlabeled entity, which allows them to overcome the first two constraints if necessary. Keil (1990) argued that ontological knowledge is better described by a tree structure than by a set of arbitrarily overlapping clusters. Predicates must apply to an entire subtree of a hierarchy, thus preventing "M" structures. Children, internalizing this "M-constraint," assume that if some things (like mice) can fear but cannot be two hours long, and other things (like baseball games) can be two hours long but cannot fear, then there should not be still other objects that both fear and last two hours.

These constraints all follow the same pattern of postulating an internal bias that fits well with an external exigency. By this approach, we are capable of apt and efficient cognition because our internal structures have evolved over millions of years to reflect external structures of importance for survival (Shepard, 1984). A common conclusion of this approach is that humans, or any other cognitive learning system, cannot be general learning devices or tabula rasas. We need to have constraints like these built into us. Furthermore, because different aspects of the world manifest different structures, we need to have different evolutionarily bestowed constraints for different domains. Hence, Cosmides and Tooby (1992) compare the human mind to a "Swiss army knife" of different tools that have each been adapted over evolutionary time to their task domain (see Twyman & Newcombe, 2010, for an extended discussion of this, and other, conceptions of modularity). The exciting possibility raised in various ways by the articles in the current special issue is that experience with a richly and diversely structured world can allow people to devise some of the constraints that they will then use to make learning from the world more efficient.
While some constraints are surely provided by evolution, others can be acquired during an organism's lifetime, and are no less powerful for being learned. In fact, acquired constraints have the advantage of being tailored to an individual's idiosyncratic circumstances. At a first pass, humans seem to live in the same, reasonably fixed world, suggesting that adaptation across generations would be most effective. Indeed, many general environmental factors, such as the color characteristics of sunlight, the position of the horizon, and the change in appearance that an approaching object undergoes, have all been mostly stable over the time that the human visual system has developed.

However, if we look more closely, there is an important sense in which people face different environments. Namely, to a large extent, a person's environment consists of animals, people, and things made by people. Animals and people have been designed by evolution to show variability, and artifacts vary widely across cultures. Evolutionary pressures may have been able to build a perceptual system that is generally adept at processing faces (Bruce, 1998), but they could not have hardwired a neural system to be adept at processing an arbitrary face, say that of Barack Obama, for the simple reason that there is too much generational variability among faces. Individual faces show variability from generation to generation, and variability is apparent over only slightly longer intervals for artifacts, words, ecological environments, and animal appearances. Thus, we can be virtually positive that hand tools show too much variability over time for there to be a hardwired detector for hammers. Words and languages vary too much for there to be a hardwired detector for the written letter "A." Biological organisms are too geographically diverse for people to have formed a hardwired "cow" detector. When environmental variability is high, the best evolutionary strategy for an organism is to develop a general perceptual system that can adapt to its local conditions.

These adaptations, once effected, act as constraints at different levels of specificity. When adapting to a single person such as Barack Obama, our early expectations may constrain how we interpret his future actions and appearances. Learned constraints have far wider implications when they are distilled from experiences with many different objects. For example, Smith, Colunga, and Yoshida (2010) report earlier experiments in which children extended a label by shape and texture when the objects were presented with eyes (signaling animacy), but extended the label by shape alone when the target and test objects were presented without eyes. Likewise, for toys, children learn that shape matters, whereas for foods, material matters (Macario, 1991; see discussion by Sloutsky, 2010). Evidence that these biases are learned comes from results showing that laboratory training allows children to acquire some of these biases at an age before they normally emerge (Smith et al., 2002). As a second example, Madole and Cohen (1995) describe how 14-month-old children learn part-function correlations that violate real-world regularities, whereas 18-month-old children do not, suggesting that children acquire constraints on the types of correlations that they will learn.
As a final example, early language experience establishes general hypotheses about how stress patterns inform word boundaries (Jusczyk, Houston, & Newsome, 1999). Children are flexible enough to acquire either the constraints imposed by a stress-timed language like English or those of a syllable-timed language like Italian, but once they imprint on the systematicities within a language, they are biased to segment speech streams into words according to these acquired biases. In all of these cases, constraints are acquired that subsequently influence how children will learn other materials from the same domains.

Learning Overhypotheses

A learning system must have constraints on hypothesis formation in order to learn concepts in a practical amount of time, but a considerable amount of flexibility is still needed because different people face different worlds and tasks. Several of the articles in this special issue explore ways in which this dilemma can be resolved by making constraints themselves learnable. One way to think about this possibility is in terms of Nelson Goodman's (1954) notion of an overhypothesis: a hypothesis of the form "All As are B," where A and B are generalizations of terms used in other hypotheses that we are interested in (Kemp, Goodman, & Tenenbaum, 2010; Kemp, Perfors, & Tenenbaum, 2007). One might have hypotheses that all dogs have four legs, all storks have two legs, and all worms have no legs. Generalizing over both animals and leg number, one could construct an overhypothesis that "All animals of a particular type have a characteristic number of legs." The power of such a hypothesis is that upon seeing only a single six-legged beetle, one can infer that all beetles have six legs. Research indicates that adults employ probabilistic versions of overhypotheses such as these (Heit & Rubinstein, 1994).

Kemp et al. (2010) present a quantitative, formal approach to learning overhypotheses. Their Hierarchical Bayesian Framework describes a method for learning hypotheses at multiple levels, as with the legged-animals example above. Representations at higher levels capture knowledge that supports learning at the next level down. Learning at multiple levels proceeds simultaneously, with higher-level schemas acquired at the same time that causal models for multiple specific objects are being learned. This mechanism allows Kemp et al. to accommodate, at least in spirit, the examples of constraint learning described in the previous section. Abstract knowledge supports causal learning involving specific objects, but critically, this abstract knowledge itself can be acquired by statistical learning. Accordingly, it provides a way of learning, rather than simply declaring by fiat, the abstract domains that will govern causal inferences. Their schema-learning approach discovers causal types instead of stipulating them in advance. As an example, it learns that there are two types of blocks – ones that activate a machine and ones that do not – modeling experiments reported by Gopnik et al. (2004).
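The logic of learning at two levels at once can be made concrete with a small grid-approximation sketch in the spirit of the hierarchical models discussed by Kemp, Perfors, and Tenenbaum (2007). The example below uses their familiar marbles scenario rather than legs, and every prior, grid value, and data set is an illustrative assumption rather than the published model: after observing several bags of marbles that are each uniform in color, the learner infers a higher-level "bags are homogeneous" overhypothesis, and a single marble drawn from a new bag then licenses a strong prediction about the rest of that bag.

    # Minimal sketch of overhypothesis learning by grid approximation.
    # Hypothetical values throughout; in the spirit of, but far simpler than,
    # the hierarchical Bayesian models of Kemp, Perfors, and Tenenbaum (2007).
    import itertools, math

    def betaln(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def beta_binom_loglik(k, n, a, b):
        # log P(k black marbles out of n | proportion drawn from Beta(a, b))
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + betaln(k + a, n - k + b) - betaln(a, b))

    # Level-1 data: five bags, each internally uniform (all black or all white).
    bags = [(10, 10), (0, 10), (10, 10), (0, 10), (10, 10)]  # (black, total)

    # Level-2 hypotheses: alpha = within-bag uniformity, beta = overall color bias.
    alphas = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
    betas = [0.1, 0.3, 0.5, 0.7, 0.9]
    grid = list(itertools.product(alphas, betas))

    # Posterior over (alpha, beta) given the observed bags, with a uniform prior.
    logpost = [sum(beta_binom_loglik(k, n, alpha * beta, alpha * (1 - beta))
                   for k, n in bags) for alpha, beta in grid]
    top = max(logpost)
    weights = [math.exp(lp - top) for lp in logpost]
    posterior = [w / sum(weights) for w in weights]

    # A brand-new bag: one black marble has been drawn from it.  For simplicity the
    # single observation updates only the bag-level estimate, not (alpha, beta).
    # P(next marble black | one black seen, alpha, beta) = (alpha*beta + 1) / (alpha + 1)
    prediction = sum(p * (alpha * beta + 1) / (alpha + 1)
                     for p, (alpha, beta) in zip(posterior, grid))
    print("with the learned overhypothesis:", round(prediction, 3))  # close to 1.0
    print("with a flat prior on the new bag:", round(2 / 3, 3))      # Beta(1,1) gives 2/3

Because the observed bags were each homogeneous, the posterior concentrates on small values of alpha, and the learner behaves like the beetle example in the text: a single observation from a new category supports a near-certain generalization, whereas a learner without the higher-level hypothesis predicts only 2/3.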
Kemp et al.'s models have multiple levels of abstraction, and so there might be a level that learns, for example, that pens are reliably pen-shaped and buckets are bucket-shaped; that these might both belong to a higher level that groups them together as artifacts; and that, at this level, the constraint can be expressed that all artifacts have a characteristic shape, whatever that shape is, thereby acquiring a shape bias for artifacts (Smith, Colunga, & Yoshida, 2010). Results like these suggest that association learning devices are crucially undervalued if we only focus on token-to-token associations. A child seeing a penguin is not just learning that penguins are black and white, but is also learning about relations between coloration, shape, behavior, climate, diet, and so on, for birds, animals, and natural kinds. These kinds of type-to-type associations do not release us from a dependency on constraints. In fact, given the unlimited number of abstract descriptions applicable to an observed event, constraints become particularly important in directing us toward useful levels of abstraction. For example, in the model presented by Kemp (2010), the space of higher-level causal models must be fully specified ahead of time, leading to strong constraints on what kinds of abstract models the system can learn. Still, because constraints at this higher level will presumably apply to any novel particular domain, they are best seen as constraints on how experience drives the construction of special domains.[1]

[1] Kemp et al. do not consider their models to be association learning, but the advantage of learning type-to-type relations can be exploited by non-Bayesian models.

The possibility of learning these type-to-type associations goes a long way toward severing the traditional connection between domain-specific constraints and innateness. Learning is not only caused by constraints, it causes constraints. Ample empirical evidence for the flexibility of constraints is provided by Sloutsky (2010). For example, he reports recent experiments showing that people are highly flexible in attending to different features in different micro-contexts (Sloutsky & Fisher, 2008). In one context, shape is relevant, and in another context color is relevant. When a context is reinstated, people selectively weight the contextually relevant dimension. Impressively, this is achieved even with as minimal a manipulation of context as screen location and background color. Other related demonstrations have shown that people will selectively attend to different stimulus dimensions as a function of contextual cues that are provided by the features of the stimuli themselves (Aha & Goldstone, 1992). These manipulations of context fall short of genuine domains, but in some ways the minimalism of the contextual manipulations is a strength. If people can learn to attend to different properties in arbitrarily created and minimally different contexts, then certainly domains as different as geometry and social relations would have considerably more internal structure that could be leveraged to self-organize a division between them. Sloutsky suggests that contexts can be induced through a compression-based learning system even before a selection-based learning system has come on-line in an organism's development. This is particularly so for "dense" categories in which different dimensions are highly correlated with each other.
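The dense/sparse contrast can be illustrated with a toy simulation. The sketch below is our own construction for exposition, not Sloutsky's (2010) model or SUSTAIN (Love, Gureckis, & Medin, 2004): a compression-style learner that simply averages exemplars into category centroids succeeds when every dimension is correlated with category membership, but needs selective attention once only a single dimension is diagnostic.

    # Toy contrast between compression-style and selection-style category learning.
    # An illustrative construction with made-up stimulus parameters, not a
    # published model.
    import random
    random.seed(0)

    DIMS = 6

    def make_dense(label, n=20, noise=0.15):
        # "Dense" category: every dimension is correlated with the category label.
        return [[label + random.gauss(0, noise) for _ in range(DIMS)] for _ in range(n)]

    def make_sparse(label, n=20, noise=0.15):
        # "Sparse" category: only dimension 0 is diagnostic; the rest are irrelevant.
        return [[label + random.gauss(0, noise)] +
                [random.uniform(0, 5) for _ in range(DIMS - 1)] for _ in range(n)]

    def centroid(items):
        return [sum(values) / len(values) for values in zip(*items)]

    def classify(item, centroids, attention):
        # Compression: each category is summarized by a centroid.
        # Selection: attention weights decide which dimensions count.
        def dist(label):
            return sum(w * (x - c) ** 2
                       for w, x, c in zip(attention, item, centroids[label]))
        return min(centroids, key=dist)

    def accuracy(maker, attention):
        centroids = {label: centroid(maker(label)) for label in (0, 1)}
        test = [(label, item) for label in (0, 1) for item in maker(label)]
        hits = sum(classify(item, centroids, attention) == label for label, item in test)
        return hits / len(test)

    uniform_attention = [1.0] * DIMS                   # compression alone
    selective_attention = [1.0] + [0.0] * (DIMS - 1)   # attend to dimension 0 only

    print("dense, compression alone :", accuracy(make_dense, uniform_attention))
    print("sparse, compression alone:", accuracy(make_sparse, uniform_attention))
    print("sparse, with selection   :", accuracy(make_sparse, selective_attention))

With these made-up parameters, centroids alone classify the dense categories nearly perfectly, while the sparse categories are classified well only when attention suppresses the irrelevant dimensions, which is one way of picturing why compression can do useful work before selection is fully available.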
In practice, selection and compression will typically work in tandem, by creating categories that ignore some features (via selection) while at the same time creating compact representations (via compression) that represent an assembly of co-occurring features by a centroid (Love, Gureckis, & Medin, 2004). One of the reasons why compression often seems to precede selection for natural categories is that selection requires that a categorizer has first differentiated the world's objects into dimensions. In some cases, early developed perceptual systems serve to split an object into separate dimensions. However, in other cases, much later experience provides the impetus to differentiate otherwise fused dimensions (Goldstone & Steyvers, 2001). Experience informs not only contexts and objects, as Sloutsky shows, but also the very descriptions along which objects are encoded. Statistics from the world can clump situations into contexts, objects into categories, and parts of an object into features. Each of these clumps, once established, influences future learning.

Focusing on the development of infant visual perception, Johnson (2010) gives several compelling examples of learning to see as a constraint-creating activity. He provides evidence that infants originally see their world in disconnected fragments, and that exposure to faces and objects is necessary for infants to eventually come to see entities like these as coherent. In one reported paradigm, more 6- than 4-month-old infants show anticipatory eye movements that are initiated before a ball emerges from behind an occluder, suggesting that spatiotemporal completion strengthens during this two-month period. Once the infant learns to correctly anticipate where an object will be, they are better able to look in the right place to extract more information about the object. In this fashion, learning begets still more learning. A large part of this rich-get-richer effect stems from the role that learning has in creating oculomotor patterns that appropriately constrain future information acquisition, and consequently learning. Johnson describes another excellent example of this dynamic in the work of Needham and Baillargeon (1998). Exposing infants to single or paired objects tends to lead the infants to parse subsequent events in terms of these familiarized configurations. Infants initially exposed to a cylinder abutting a rectangular box showed relatively long looking times, suggesting surprise, if one of the objects subsequently moved separately from the other. Consistent with many of the results described by Johnson, infants are surprisingly adept at adapting their perceptual systems to statistical regularities in their environment. As their visual systems become tailored to their world, they become constrained to see their world in terms of the regularities they have extracted. However, rather than viewing these acquired constraints as limiting perceptual abilities, it is more apt to view these constraints as permitting the infant to see a coherent and well-behaved world (Medin et al., 1990).

Active construction of entities and kinds

Thus far, the argument has been that any organism that would learn efficiently needs to have constraints that apply to learning particular domains, but at least some of these constraints are learnable.
Different constraints can be simultaneously learned for different contexts, object classes, modules, and domains because entities in the world naturally form recognizable clumps. Psychology is still important because the natural world can be carved into domains in many different ways depending on needs and goals. These goals shape the kinds of clumps that will be formed, but this is different from claiming that the clumps are pre-formed. Well-understood mechanisms of self-organization allow modules to be constructed for classes of objects based upon their constraints (Elman et al., 1996). By carving nature at its joints, clusters are formed such that the entities within a class are similarly constrained. Furthermore, once formed, the clusters reinforce and emphasize the distinctions between the entities. Joints are carved into nature where they were already incipient, making the joints sharper still (Lupyan, 2005). We do not need to start with domain-specific constraints. The specific domains can emerge from more domain-general principles of association, contingency detection, statistical learning, and clustering.

The current issue's articles propose a second way in which constraints are actively constructed rather than fixed. In particular, another recurring theme is that people play an active role in creating the entities to be learned. This theme is perhaps clearest in Chater and Christiansen's (2010) arguments that language speakers shape their language over generations in ways that make it more easily learned by others. Rather than language learning consisting of the acquisition of fixed structures in a natural, linguistic world, the task confronting language learners is typically one of "C-learning" – simply learning to coordinate with other individuals. This is a much easier task as long as the learner exists in a milieu in which the other individuals are generally configured similarly to the learner. On this view, languages are evolving to be learnable at the same time that people are evolving to learn language. Language evolution assumes particular importance in this view because languages change at a much faster rate than do genes. Certainly individuals still need to acquire their indigenous language, but this will be a language that has rapidly evolved so as to be efficiently learnable by people with general perceptuo-motor, communicative, and cognitive constraints. The premise that languages can evolve relatively quickly is supported by documented reports that Nicaraguan Sign Language emerged in as little as three decades within a community of deaf children with little exposure to established languages (Senghas, Kita, & Özyürek, 2004).

Casting language as C-learning does not trivialize its difficulty. If language were purely a problem of coordination, it could be solved by creating a very simple language containing only one word. But a language evolves under several distinct selection pressures in addition to learnability. A language should be easily comprehended once learned, so ambiguities and confusions may make a language less successful, even if it is easily learned. Most important, a language must have sufficient power to express rich and structured thoughts. Language evolution is thus not completely untethered.
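The core dynamic – that whatever is transmitted across generations of learners drifts toward what those learners find easy to acquire – can be caricatured in a few lines of simulation. The sketch below is our own toy iterated-learning chain, not Chater and Christiansen's (2010) analysis or a model of Nicaraguan Sign Language; the prior values, bottleneck size, and the reduction of a "language" to a single regularity parameter are all illustrative assumptions.

    # Toy iterated-learning chain: a "language" is reduced to the probability that
    # a regular form is used.  Each generation learns from a small sample produced
    # by the previous generation, using a prior that favors regularity.
    # All numbers are illustrative assumptions.
    import random
    random.seed(1)

    def learn(observations, prior_regular=3.0, prior_irregular=1.0):
        # Bayesian learner: Beta prior biased toward regularity, updated by the
        # observed forms; the posterior mean becomes this learner's own usage.
        regular = sum(observations)
        total = len(observations)
        return (prior_regular + regular) / (prior_regular + prior_irregular + total)

    def produce(p_regular, n_utterances=10):
        # The transmission bottleneck: the next learner hears only a few forms.
        return [1 if random.random() < p_regular else 0 for _ in range(n_utterances)]

    p = 0.2                      # the founding generation is mostly irregular
    history = [p]
    for _ in range(30):
        data = produce(p)        # limited evidence passed to the next generation
        p = learn(data)          # that learner's hypothesis becomes the new language
        history.append(p)

    print([round(x, 2) for x in history[::5]])
    # Regularity drifts upward across generations: the language adapts to the
    # learners' bias, even though no individual learner changes.

Over generations the language converges toward forms favored by the learners' prior, which is one sense in which languages can be said to evolve to be learnable; the further point in the text is that real languages face additional pressures, such as expressivity and comprehensibility, that keep this drift from being the whole story.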
Nonetheless, language does provide an excellent case study of a domain that is configured by general human constraints rather than existing as a preconfigured domain requiring language-specific constraints to acquire it.

Though focusing on language acquisition rather than language evolution, Smith, Colunga, and Yoshida (2010) nonetheless echo several of the themes raised by Chater and Christiansen. Smith et al. again point to ways in which language is shaped by general cognitive constraints, including general attentional effects, the highlighting of novel cues that co-occur with novel outcomes, the illusory projection of cues that are associated with other cues but are not themselves present, and the construction of clusters reflecting correlated features (see also Sloutsky's compression mechanism). They show that language, as it is constructed by a child in a particular environment, reaches back to affect how that environment is coded. Unlike English, Japanese makes no distinction between count and mass nouns. However, giving Japanese children training with the kind of correlated linguistic cues that an English-speaking child might receive causes them to behave more as an English-speaking child might. In particular, they generalize names for solid things by shape and for nonsolid things by material, rather than showing the less sharply delineated generalization pattern of an average Japanese child. Critically, this effect of linguistic training has a lasting influence on children even when the linguistic cues are no longer present. Language is instrumental in establishing categories like count, mass, and animate nouns. While scaffolded, in part, by language, these categories remain in place when the scaffold is removed. It is this dynamic that leads Smith et al. to argue that children are forming the domains via which they will organize their world, and language both reflects and guides these acts of creative construction.

Applying this approach to domain-specificity, many of this issue's articles pursue the possibility that domain-general constraints are sufficient to produce what eventually become, or simply appear to be, domain-specific constraints, including Chater and Christiansen (2010), Newcombe (2010), Sloutsky (2010), and Smith, Colunga, and Yoshida (2010). Recall that the basis for the traditional link between domain-specificity and constraints lies in the idea that there are innate constraints associated with each kind of stuff – each domain, or area of "core knowledge" (Spelke & Kinzler, 2007). Empirical evidence indicates that we do not need to start with domain-specific constraints. The specific domains can come from more domain-general principles. Chater and Christiansen (2010) describe general cognitive routines for processing sequential information that are used in natural language reading and statistical learning; language is not as special as might have been believed. Some of the domain-general processes that they single out include the encoding, organization, and production of temporally unfolding events. These processes are useful for finding structures in language and visual sequences alike, and positing domain-general constraints helps to explain patterns of correlation between language and visual temporal processing. Once it has been learned, language is highly constrained, but if we just look at the eventual constrained forms, it is easy to forget where they came from.
Likewise, for the domain of spatial navigation, Twyman and Newcombe (2010) argue that people, old and young alike, integrate across multiple cues, including featural cues, rather than relying on a single geometric module. The apparently constrained nature of children's navigation gives way to broader domain-general processes. Even children can use curtain color, a cue not considered to be part of the "geometric module," when the room is large and they have experience with the task. We need to think about how cues are combined and integrated, which is both the bane and the boon of domain-general processes. Information has to be combined across multiple sources, a process that is sometimes complex. However, the complexity is often more than justified by the benefits conferred by mixing expert systems, and by having these expert systems simultaneously train each other (de Sa & Ballard, 1998).

Adding Skeletons to Flesh

An important plank of the new approach to developmental constraints espoused in this issue's pages is that developing children learn to organize their world into the categories that they will use to guide their inferences. This New School of Constraints agrees with the Old School on the basic principle that different properties are important for different domains. For example, Gelman and Markman (1986) present evidence that children generalize biological properties to objects with the same superordinate name but a different appearance. For the biological property "cold-blooded," children extend more inferences from triceratops to brontosauruses than to rhinoceroses. However, for physical properties (like weighing one ton), they generalized more to the rhinoceros (which resembled the triceratops) than to the brontosaurus. Heit and Rubinstein (1994) find that adults are as flexible as children, for example generalizing an anatomical property more from chickens to hawks than from tigers to hawks, but generalizing feeding and predation properties more from tigers to hawks. Earlier, Nisbett et al. (1983) showed that reading about just one member of a tribe with a certain skin color makes people think that all of the tribe members have that skin color, but they are not as profligate with their inductions about the generality of obesity in the tribe upon seeing only one obese member.

One possible conclusion from these kinds of studies is that both children and adults come to their world already having broken it down into kinds of things, and this is critical because, in order to know how something will behave, one needs to know what kind of thing it is. Domains provide the skeleton on which to hang knowledge. As R. Gelman (1990) writes, "I find it helpful to think of a skeleton as a metaphor for my notion of first principles. Were there no skeletons to dictate the shape and contents of the bodies of the pertinent knowledge, then the acquired representations would not cohere" (p. 82). In referencing skeletons, Gelman clearly has in mind an a priori structure on which to hang experiential knowledge. However, to the extent that we, the authors, find skeletons to be an apt metaphor for knowledge, it is only when we reflect on the fact that skeletons themselves are not a priori structures, but rather unfold through a developmental process of their own and are molded by need. The tennis player John McEnroe's right hand is significantly larger than his left hand, because his skeleton and musculature are not a priori givens.
In general, bone mineral content is greater in the dominant arm of professional tennis players than in their contralateral arm, but not so for a control group (Calbert et al., 1998). Tennis players' literal skeletons have adapted to fit their tennis requirements.

Neural network models provide working examples of skeletons forming because of the inputs provided to them. Bernd Fritzke's (1994) Growing Neural Gas model provides a compelling example of this. When inputs are presented, edges are grown between nodes that are close to the input, and new nodes are created if no node is sufficiently close to the input. The result is a skeleton that can aptly accommodate new knowledge because it was formed exactly in order to accommodate that knowledge. This skeleton-creating approach also appears in Rethinking Innateness (Elman et al., 1996), where one of the primary ideas is that the existence of modularity does not implicate innateness. Modules can be learned because systems can self-organize to have increasingly rich and differentiated structure. Computational modeling suggests that the eventual specialization of a neural module often belies its rather general origins (Jacobs, Jordan, & Barto, 1991). Very general neural differences, such as whether a set of neurons has a little or a lot of overlap in their receptive fields, can cause two populations of neurons to spontaneously specialize for handling either categorical or continuous judgment tasks, or can snowball small initial differences into large-scale "what" versus "where" visual systems (Jacobs & Jordan, 1992). At a higher level of abstraction, self-organizing neural network models have been proposed that account for how originally undifferentiated concepts become differentiated and increasingly structured with development (Rogers & McClelland, 2008). Without belaboring the details of these models, there are enough examples of working, skeleton-creating models to believe that, to know how something will behave, one needs to know what kind of thing it is, but that these kinds can emerge through the progressive differentiation of objects into domains with experience.

Getting Physical about Constraint Learning

To our mind, the articles in this special issue offer true advances in our understanding of cognitive constraints as developing over a person's lifetime. However, in the interest of urging the field not to rest on its laurels, we wish to point out that the models that have been presented to make this point strike us as incorporating input representations that are highly pre-processed and symbolic. This modeling decision prevents the models from producing constraints and representational capacities as novel as they otherwise might. First, let us give some examples of what we feel are modeling choices that constrain too tightly the kinds of constraints that can be induced. Kemp et al.'s (2010) model succeeds in simultaneously learning object-level causal models, category-level causal models, and the categories that occupy the upper levels. However, to achieve this, the authors assume that the learner has already divided the world into domains (e.g., people and drugs) and events (e.g., ingestion and headache events). Furthermore, the model is prebuilt to learn causal models that relate the ingestion of drugs to headaches. Finally, actual events like "has headache" are pre-coded as atomic, non-decomposable symbols.
There is no perceptual apparatus that grounds objects, events, or causal relations, and hence no way to adapt perceptual processes to establish new kinds of entities. To be fair, the authors admit all of these constraints, and gesture to some possible ways of adding flexibility to their model.

To take a second example from a different modeling tradition, Rogers and McClelland's (2004, 2008) neural network model is an attempt to understand how domain-specific knowledge emerges solely from general learning principles. In this sense, it fits well within the current articles' leitmotif that pre-established constraints on pre-established domains are not required to account for the eventually structured form of cognitive representations. The particular structures that their PDP model, trained by back-propagation, acquires are: progressively differentiated concepts, coherent categories characterized by clusters of co-occurring features, conceptual reorganization over time, and domain-specific attribute weighting. Their system takes as input statements like "Canaries can fly" and "Canaries can grow" and creates emergent and shifting clusters for different kinds of animals and attributes. From the perspective of learning constraints, though, we feel that the input representations are limiting. Single nodes are dedicated to each element in a proposition, such as "Canaries," "Can," and "Fly." These nodes are not connected to a physical world via a perceptual system, so once again there is no perceptual system to adapt.

A natural response to our objection is "One has to start somewhere. Progress can be made on constraint learning without accounting for the organism's relation to its world." We do not deny that some progress can be made, but at the same time we feel that the most compelling examples of learning new objects and domains come exactly from situations where the organism's embedding in the world is rich, high-bandwidth, and dynamic (Beer, 2008). In this respect, we are in agreement with Johnson (2010) and Smith, Colunga, and Yoshida (2010).

Case studies of novel constraint-generation in physical systems

To show the kinds of novel objects and constraints that can emerge in situated systems, we will consider two working, physical devices, without asserting that they are formal models of learning. Figure 2 shows the first physical model, developed by Gordon Pask (1958), which features electrodes immersed in a dish containing a ferrous sulfate solution. Passing current through the electrodes caused dendritic metallic filaments to grow as precipitates from the fluid. Ferrous filaments could be adaptively grown to make the system sensitive to sounds. Early on, the system could only detect the presence or absence of sounds, but once filaments grew that joined electrodes and changed electrical conductance, the device was able to discriminate two frequencies. The conducting filament pathway is shaped by exactly the vibrational perturbations that it detects, and how it detects those perturbations is changed by its growth. This device has the capacity to represent things, like the difference between tones of 50 and 100 cycles/second, which it was not originally able to represent. This kind of model provides a compelling existence proof for a system that creates its own constraints – constraints that did not exist before a certain physical connection was made.
It is a device that, when (literally) immersed in the proper environment, develops its own concept of what is relevant.

Our second example is a physical device meta-designed to accomplish, perhaps coincidentally, a similar tone discrimination task. For this task, Thompson, Layzell, and Zebulum (1999) employed an FPGA (Field-Programmable Gate Array) containing a 10 × 10 array of programmable logic components called "logic blocks." The logic blocks consist of multiplexers that act like switches to determine input-output relations between blocks. A computer is used to configure the multiplexers by sending them a stream of bits, thereby causing the multiplexers to physically instantiate a particular electronic circuit on the chip. FPGAs are thus integrated circuits with hardware that can be configured to specify how the logic blocks are connected to each other. Thompson et al. used a genetic algorithm to randomly alter the hardware of the FPGA, and then tested how well the resulting, automatically configured FPGA could accomplish the task of discriminating between 1 kHz and 10 kHz square wave tones. After 5000 generations, the FPGA could solve the task well.

The best-performing hardware design is shown in Figure 3. In this figure, the upper-left panel shows the entire set of connections among the logic blocks. However, because a random genetic algorithm was employed, there is no guarantee that this FPGA is actually a coherent electronic circuit at all. In fact, it was not, and some of the logic blocks are not part of the logical circuit of the FPGA that classifies a tone as high or low. These pruned blocks are not part of any connected pathway that leads to the output units. Pruning these blocks yields the upper-right panel. The remaining blocks are part of the logical circuit and could influence the FPGA's output, but that is no guarantee that they do. The authors clamped the values of each block in the full 10 × 10 array – both those that were and those that were not included in the pruned circuit – to determine whether a block ever influenced outputs. Removing from the electrical diagram the largest set of blocks that can be clamped without affecting performance results in the lower panel of Figure 3.

Comparing the two methods of paring down the full 10 × 10 array of blocks reveals a surprising pattern. Some blocks (shown in gray) that were not part of the logical circuit of the evolved solution are nonetheless important for the FPGA's success at the task. When they are clamped to constant but randomly selected values, performance suffers; hence they influence the circuit's behavior and quantifiable performance. Furthermore, when only the logical circuit is implemented in a new, nominally identical FPGA, it no longer solves the task as well as the original circuit did. Likewise, a digital simulation of the evolved FPGA's circuit did not produce correct behavior. This apparent contradiction is reconciled by understanding that although the FPGA is a digital chip, it also has analog properties as a physical device. Conventional digital circuit design makes the assumption that a given block will output only one of two states at any given time, when actually there are transition periods. The "digital assumption" has the prominent advantage that it allows us to think in terms of a logic algebra. However, the multiplexer switches work according to the laws of semiconductor physics.
The circuit changes as a real-time, continuous-valued electrical system. For some of the cells that are part of the functional, but not the logical, circuit, inputs are routed into them from the active circuit, but their outputs are not used by the active network. These cells influence the timing of the signals routed through or near them. If the configuration of the gray cells is changed, this affects the capacitance of these routes, and hence the time delays for signals traveling along them. These signals include not only on/off states, but also transitional values between these two states, which are normally considered the only possible states in formal electronics. Thus, the cells that are not part of the logical circuit can still change the timing of the rest of the circuit by influencing their transient states, and because the tone discrimination task is highly dependent on timing, these changes are functionally important.

Reliably traumatic learning in physical systems

With these two examples in mind, we can address a core question for the field of cognitive development: How can systems develop genuinely new cognitive capacities? One answer is simply that they cannot. Fodor (1980) argues that learning a new concept necessarily involves first forming a hypothesis and then testing it. Therefore, a person could not acquire a new concept unless they already had the ability to represent the corresponding hypothesis. Learning can determine whether the hypothesis is true, but the fundamental ability to represent the hypothesis cannot be learned. If a person can learn that a square is a four-sided object with all angles equal and all sides equal, then it must be because the person already had the wherewithal to represent concepts like "four," "angles," and "equal." A system can increase its representational power by "physical trauma" like a blow to the head, but not through formal inductive learning processes. Inductive learning does not increase a system's representational "vocabulary" because mechanisms must already have been in place to express the "new" vocabulary, and so it has not been genuinely created.

A related argument is presented by Kemp et al. (2010) in defending their approach from the criticism that their model does not learn or discover causal schemata, but rather only selects one schema from a pre-specified space of hypotheses. Kemp et al.'s response is that, from a computational perspective, every learner begins with a pre-specified hypothesis space that represents the abstract potential of the learner. This hypothesis space includes all reachable states of knowledge given all possible empirical inputs. Both Fodor's and Kemp et al.'s arguments take the same form: true novelty is impossible because all hypotheses (or their components, for Fodor) must exist in order to be (eventually) selectable.

Pask's device tidily points out the simplistic nature of this argument. There may be a sense in which all discriminations that the device can learn are present in the device when it is first constructed. However, that is only the same trivial sense in which all organs of all life forms that currently inhabit the planet were present in the earliest bacteria. Such a claim trivializes the physical changes that allow Pask's device to represent distinctions between sounds that it was originally unable to make.
These physical changes may be dismissed as "trauma," but they are nonetheless highly systematic, much as are the actual physical changes in the brain that lead to long-term potentiation across the synapse between two neurons that are co-stimulated. The physicality of the change in Pask's device makes it instructively clear that at one point in its development, prior to the existence of a conductive filament that has grown to connect two electrodes, it is incapable of making a sound discrimination, but that after the physical change has transpired, it is capable.

We agree with Kemp et al. (2010) that a formal model of learning must originally contain all the hypotheses it will eventually be able to entertain.[2] It is important to remember, though, that formal models are not the same thing as working, rigorous, and replicable models. Thompson's FPGA and Pask's device may well be models of the latter kind, and they offer the kind of physical models that may well be needed to yield genuine novelty with development. Thompson's evolved FPGA clearly shows the disadvantage of hewing too closely to a formal algebra. The formal model systematically eliminates the possibility of discovering solutions to the sound discrimination task that lie outside of the digital circuit framework. When the evolved circuit is reduced to its formal description, it no longer solves the task.

[2] The authors also argue that this computational-level response, though correct, could be supplemented by algorithmic-level accounts that provide a process model for creating hypotheses. The authors do not provide such an account. We are arguing for far greater attention to algorithmic-level, and even implementational-level, process accounts of how new hypotheses are constructed in the first place. Attention to these levels is what allows truly emergent novelty to arise in a system.

A physical device such as Thompson's FPGA can have more than one appropriate formal idealization. No doubt there is a more elaborate formal logic that captures, via explicit representation, the temporal properties of the FPGA, just as a sufficiently complex formal model could capture the chemical properties of Pask's device. Such a richer model could well solve the tasks these physical devices do, and all the capabilities of these devices would then be implicitly contained in that formal description. But, crucially, in order to design that formal model, one would have to know just which physical properties mattered to the behavior of interest. From this perspective, to observe that all of the possible hypotheses that can be entertained by a formal system are already implicit in the definition of the system is akin to noting that by the time one can build a satisfying formal model, one must already know how the real system works. On the other hand, a physical model, or a model with rich, high-bandwidth connections to an external physical environment, can exhibit properties that are not contained in any preexisting conceptualization. The moral, for us, is that we want our models, like our children, to be sufficiently flexible that they can surprise us with solutions that we had not anticipated in our formal analyses.

Constraint generation by interacting with other physical systems

Physical interactions with an external environment may also play a role in constructing and constraining a hypothesis space.
Several recent models of high-level cognitive activities depend on the routine application of deictic representations (cf. Agre & Chapman, 1987; Ballard et al., 1995). For instance, Patsenko and Altmann (2009) describe a model of the process of solving the Towers of Hanoi. This is a quintessentially high-level task, but Patsenko and Altmann's model performs remarkably little reasoning. Instead, a quite simple control system defines a method for manipulating visual pointers and physical objects. Assuming a reliably stable external environment, this control system, which includes rules for updating referential representations, suffices to perform the Towers of Hanoi task in a manner that closely matches human behavior. Landy, Jones, and Goldstone (2008) similarly suggested that deictic representations might drive human syntactic judgments in formal arithmetic. To the extent that reasoning depends on deictic representations of this kind, the ability of an agent to maintain and coordinate multiple referential symbols provides a natural constraint on the kinds of hypotheses that can be maintained. There is little reason to think of the maintenance of these referential pointers as a purely internal, representational process. Rather, to the degree that hypotheses are represented via interactions with external representations, maintenance of those hypotheses depends both on internal resources and on various physical properties of the interacting systems, the external environment, and the ability of the agent to manipulate that environment (Kirsh, 1995). In this case, the constraints that limit the hypotheses that can be constructed are encoded in the physical structure of the environment and in agent-environment interactions.

In some cases, hypotheses themselves may be built out of environmental components, with the result that constraints on one's ability to construct physical analogs will limit and constrain the hypothesis space that can be constructed. Seventeenth-century physicists generated their hypotheses about physical phenomena by mapping them onto known physical mechanisms (Bertolini Meli, 2006). Nersessian (2010) suggested that the physical models engineering scientists construct often serve a role in the generation and representation of hypotheses: reasoners construct a physical model, which serves both as an object of study and as a repository for theory. Given that the physical instantiation is not well understood, making a prediction from the theory may require consulting – running – the model. When hypotheses are partially external constructs, our ability to form hypotheses will be constrained not just by the limitations of a predefined hypothesis space intrinsic to an agent, but also by our practical ability to build particular physical models that instantiate theories. For example, if a particular neural network is too complex to build, or would take too long to run on an available computer, one may well simplify it. The physical, external constraints limit the available hypothesis space in a manner neither fixed nor internal, but nevertheless quite restrictive.

Conclusions and caveats to the novelty that physical systems permit

To the extent that concepts like squares and angles are rooted in one's ability to produce and perceive physical models of squares and angles, one's hypothesis space is at least softly constrained by the ability to coordinate fingers, limbs, and counting procedures.
No less than scientists, children are situated, physical systems, and their physical presence is critical to their ability to develop their own constraints and to increase their own representational capacities. This does not mean that we eschew computational and mathematical models of cognitive development because of their lack of physicality. However, we do recommend efforts to move beyond models, be they connectionist or Bayesian, that severely constrain the hypothesis space (Landy & Goldstone, 2005). We advocate models that preserve enough rich spatial, perceptual, dynamic, and concrete information for surprising new classes of hypotheses and encodings to emerge. We believe the hard problem for cognitive development will not be selecting from hypotheses or creating associations between already delineated elements, but rather constructing hypothesis spaces and elements in the first place.

We have argued for interpenetrations between people and their environments, and between the physical and functional descriptions within learning devices such as people. Furthermore, we have argued that these interpenetrations are crucial in developing learning systems that create genuinely new hypotheses, not just permutations of pre-established symbols. However, a critic might point out that these interpenetrations make for fault-prone and noise-intolerant devices. Thompson et al. (1999) notwithstanding, most current electronics are designed to behave digitally precisely to provide tolerance to superficial variation in voltage signals that is irrelevant to the critical information. Even "squishy" organic life as we know it owes its longevity to a genetic code that closely approximates a digital code consisting of nucleotides and codons. Complex cellular machinery is dedicated to assuring that the code is relatively inert and is protected from many contextual influences (Rocha & Hordijk, 2005). It is reasonable to think that our cognitive system benefits from the same strategy of developing reusable, quasi-context-independent codes (Dietrich & Markman, 2003). Much of the benefit of discrete codes is exactly that they are not buffeted about by extraneous contextual influences.

We agree that systems that behave reliably have often evolved to have representations that are mostly sealed off from lower or outside pressures. Simon (1969) called such systems with multiple, encapsulated levels "nearly decomposable," and we agree with their importance in cognitive systems, but wish to equally emphasize the importance of the qualifier "nearly." If Thompson et al. (1999) had designed electrically "cleaner" circuits with proper electrical shielding around cells, then their system would not have been able to evolve the same solutions to the sound detection problem, which were based on influences of electrical flow on nearby capacitances. Analogous advantages are found for systems that coordinate with their physical environment to solve problems that they could not otherwise solve. The advantage of creating new representations by recombining existing codes is too powerful to forgo. However, systems are still more flexible when this combinatorial flexibility is extended by interpenetrations across levels and systems.
